Base and background functions
1 Base function
An interleaving algorithm was previously defined to increase the density of sampling points, which is otherwise limited by the 32 pixels of the line detector. The algorithm performs a window integration (32 windows) over multiple optical fields with different \(\mu\) parameters, which is equivalent to shifting the sampling grid laterally. Finally, the interleaved data are interpolated, which yields what is defined here as the base function.
This base function represents a ‘smooth surface’ that will then be compared with rougher samples. To obtain an experimental base function, data were collected as illustrated in Figure 1 A: a smooth silicon wafer was rotated to different angles, from -1.0 to 1.0 degrees, and the 32 raw sampling points of the line detector were acquired at each angle.
The next step is to sort the data points and apply the interleaving algorithm; this yields the experimental base function illustrated in Figure 1 B.
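The sort-and-interleave step can be sketched in a few lines of NumPy. This is a minimal illustration, not the notebook's implementation: it assumes each acquisition is stored as one row of 32 intensities and that a shift \(\mu\) moves the pixel axis to \(x-\mu\), as the notebook code does with `xaxis - row.M`. The function name `interleave` is hypothetical, and the window integration itself is not modelled.

```python
import numpy as np


def interleave(intensities, shifts, pitch=1.0):
    """Merge several shifted line-detector acquisitions into one sorted curve.

    intensities: array of shape (n_shifts, n_pixels), one row per acquisition.
    shifts: lateral shift (mu) of each acquisition, in pixel units.
    Returns the interleaved (x, y) arrays sorted by x.
    """
    intensities = np.asarray(intensities, dtype=float)
    shifts = np.asarray(shifts, dtype=float)
    n_pixels = intensities.shape[1]
    # Pixel centres, e.g. -15.5, -14.5, ..., 15.5 for a 32-pixel detector
    pixel_axis = (np.arange(n_pixels) - (n_pixels - 1) / 2) * pitch
    # Each acquisition sits on its own shifted axis x - mu
    x = (pixel_axis[None, :] - shifts[:, None]).ravel()
    y = intensities.ravel()
    # Sorting by x interleaves the samples of all acquisitions
    order = np.argsort(x, kind="stable")
    return x[order], y[order]


# Example: four acquisitions shifted by a quarter pixel each; row r is filled with the value r
x, y = interleave(np.arange(4)[:, None] * np.ones(32), [0.0, 0.25, 0.5, 0.75])
# x[:4] -> -16.25, -16.0, -15.75, -15.5 and y[:4] -> 3, 2, 1, 0
```

With four quarter-pixel shifts the 128 interleaved samples are evenly spaced 0.25 pixel apart, which is the density gain the interleaving is meant to provide.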

1.1 Experimental collection
The following code loads and plots the acquired experimental data, one panel per shift value M:
Code
import numpy as np
import pandas as pd
from bokeh.layouts import gridplot
from bokeh.models import Range1d, Span
from bokeh.plotting import figure, show
# plot_format(...) is a plotting-style helper that is not shown in this document

# Colour cycle used to tag the points of each acquisition
new_colors = ['#9D6C97', '#9DC3E6', '#9DD9C5'] * 42
# 1. Read both sheets of the Excel file into a dict of DataFrames
df = pd.read_excel('data/base_function.xlsx', sheet_name=['base', 'M'])
# 2. Split into raw intensities ('base') and shift values ('M'), sorted by M
base_df = df['base']
M_df = df['M'].sort_values(by='M')
# M_df = M_df[~M_df.isin([-0.002, 0.003]).any(axis=1)]
frames = []
# 3. Create x axis: centres of the 32 detector pixels (-15.5 ... 15.5)
xaxis = np.arange(-15.5, 16.5, 1)
plots = []
# 4. Iterate over the M dataframe (one acquisition per shift)
for index, row in M_df.iterrows():
    # a. Plot raw sampling data on the shifted axis
    p = figure(title=f'M: {row.M}', x_axis_label='sampling point', y_axis_label='intensity',
               tooltips=[("index", "$index"), ("(x,y)", "($x, $y)")], width=200, height=150)
    new_axis = xaxis - row.M
    p.line(new_axis, base_df[index], line_color='#9DD9C5', line_width=3)
    p.circle(new_axis, base_df[index], size=4)
    vline = Span(location=0.0, dimension='height', line_color='#FEEED9', line_width=1)
    p.add_layout(vline)
    # b. Plot format
    p.x_range = Range1d(-7, 7)
    p.yaxis.ticker.desired_num_ticks = 4
    p = plot_format(p, "Degrees", "Intensity", "bottom_left", "8pt", "8pt", "8pt")
    plots.append(p)
    # c. Collect the shifted points (DataFrame.append was removed in pandas 2.0)
    frames.append(pd.DataFrame({'mu': [row.M] * 32, 'xaxis': new_axis,
                                'yaxis': base_df[index].values, 'colors': new_colors[0:32]}))
sorted_df = pd.concat(frames, ignore_index=True)
grid_raw = gridplot(children=plots, ncols=6, merge_tools=False)
show(grid_raw)

1.2 Base function smoothing and interpolation
The experimental base function is then obtained by interleaving the acquired experimental data. Some of the interleaved points lie very close to one another (less than 0.01 apart on the x axis), which produced a ‘zig-zag’ shape in the interpolation. To remove it, each pair of consecutive points closer than 0.01 is replaced by its average before a PCHIP interpolation is applied. The notebook code used to obtain the experimental base function is given below.
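A compact, testable sketch of the same merge-then-interpolate idea is given here ahead of the notebook code; the function names are hypothetical. Consecutive sorted points closer than `tol` are grouped and averaged with NumPy, and the result is passed to SciPy's `PchipInterpolator`, which needs strictly increasing x values. Unlike the pairwise loop in the notebook code, this version averages whole runs of close points, so it can give slightly different values when three or more points fall within the tolerance.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def merge_close_points(x, y, tol=0.01):
    """Average runs of consecutive (sorted) points whose x-gap is below tol.

    The result is strictly increasing in x and therefore safe to interpolate.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, kind="stable")
    x, y = x[order], y[order]
    # A new group starts wherever the gap to the previous point is >= tol
    new_group = np.concatenate(([True], np.diff(x) >= tol))
    group_id = np.cumsum(new_group) - 1
    counts = np.bincount(group_id)
    x_merged = np.bincount(group_id, weights=x) / counts
    y_merged = np.bincount(group_id, weights=y) / counts
    return x_merged, y_merged


def base_function_interpolant(x, y, tol=0.01):
    """Shape-preserving (PCHIP) interpolant through the merged points."""
    x_m, y_m = merge_close_points(x, y, tol)
    return PchipInterpolator(x_m, y_m)


# Example: two near-duplicate pairs (0.0/0.004 and 2.0/2.005) are merged
x = np.array([0.0, 0.004, 1.0, 2.0, 2.005, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 11.0])
x_merged, y_merged = merge_close_points(x, y)
# x_merged -> 0.002, 1.0, 2.0025, 3.0 and y_merged -> 2.0, 5.0, 8.0, 11.0
f = base_function_interpolant(x, y)
```

PCHIP is used here, as in the notebook, because it does not overshoot between data points, which keeps the interpolated base function free of artificial oscillations.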
Code
import numpy as np
import pandas as pd
from bokeh.layouts import gridplot
from bokeh.models import Range1d
from bokeh.plotting import figure, show
from scipy.interpolate import PchipInterpolator
TOOLTIPS = [("index", "$index"), ("(x,y)", "($x, $y)")]
# 5. Create the three figures
interleaved_plot = figure(title='Interleaved base function', x_axis_label='sampling point', y_axis_label='intensity',
                          tooltips=TOOLTIPS, width=700, height=500)
smooth_plot = figure(title='Smooth base function', x_axis_label='sampling point', y_axis_label='intensity',
                     tooltips=TOOLTIPS, width=700, height=500)
interpolated_plot = figure(title='Interpolated base function points', tooltips=TOOLTIPS, width=700, height=500)
# a. Sort the interleaved points by x, then average each pair of consecutive points closer than 0.01
base_function_df = sorted_df.sort_values(by='xaxis').reset_index(drop=True)
rows = [base_function_df.loc[0, ['xaxis', 'yaxis', 'colors']].tolist()]  # keep the first point
for k in range(1, len(base_function_df)):
    prev, curr = base_function_df.loc[k - 1], base_function_df.loc[k]
    if curr['xaxis'] - prev['xaxis'] < 0.01:
        # replace the last smoothed point with the mean of the two close raw points
        rows[-1] = [(curr['xaxis'] + prev['xaxis']) / 2, (curr['yaxis'] + prev['yaxis']) / 2, curr['colors']]
    else:
        rows.append([curr['xaxis'], curr['yaxis'], curr['colors']])
smooth_df = pd.DataFrame(rows, columns=['xaxis', 'yaxis', 'colors'])
smooth_df[['xaxis', 'yaxis']] = smooth_df[['xaxis', 'yaxis']].astype(float)
# b. Plot the points and the non-smooth and smooth curves
for plot, data, legend, color in [(interleaved_plot, base_function_df, 'Non-smooth base function', '#9DC3E6'),
                                  (smooth_plot, smooth_df, 'Smooth base function', '#9D6C97')]:
    # individual points, colored by acquisition
    plot.circle(data['xaxis'], data['yaxis'], color=data['colors'], size=6)
    # connecting curve
    plot.line(data['xaxis'], data['yaxis'], line_width=4, legend_label=legend, color=color)
    # format
    plot.xaxis.ticker.desired_num_ticks = 15
    plot.y_range = Range1d(0, 45000)
    plot = plot_format(plot, "Degrees", "Intensity", "top_left", "10pt", "10pt", "9pt")
# c. Shape-preserving PCHIP interpolation on a 0.001 grid
x_base = np.arange(-15.5, 15.5001, 0.001).round(3)
pchip = PchipInterpolator(smooth_df['xaxis'], smooth_df['yaxis'])
y_base = pchip(x_base)
interpolated_plot.line(x=smooth_df['xaxis'], y=smooth_df['yaxis'], line_width=5, legend_label='Smooth base function', color='#9D6C97')
interpolated_plot.line(x_base, y_base, line_width=5, color='#9DD9C5', legend_label='Interpolated base function')
interpolated_plot.xaxis.ticker.desired_num_ticks = 15
interpolated_plot.y_range = Range1d(-1000, 45000)
interpolated_plot = plot_format(interpolated_plot, "Degrees", "Intensity", "top_left", "10pt", "10pt", "9pt")
base_function_grid = gridplot(children=[interleaved_plot, smooth_plot, interpolated_plot], ncols=3, merge_tools=False, width=420, height=380)
show(base_function_grid)

2 Background function
The experimental base function is the numerical reference function obtained from a smooth wafer at different angles. In practice, however, the goal is to measure wafers with different degrees of roughness. Mathematically, roughness appears as a change in amplitude and as additional tails in the base function, as illustrated in Figure 2. Hence, a set of parameters has to be found that fits the base function to the real rough experimental data.

To go from the smooth base function to rough data, a background function is added that modifies the amplitude and the tails, as illustrated in Figure 3. The ‘smooth’ base function (a) is modified by adding a background function (b), e.g., a Gaussian or a Lorentzian with its corresponding amplitude and width (\(\sigma\) or \(\gamma\)) parameters. The sum is the modified function (c). The final step is to downsample the modified function (d) and compare it with the experimental rough data by means of an error function.
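The four steps (a) to (d) can be summarised as a short NumPy sketch. It is illustrative only: the base function is a hypothetical Gaussian-shaped stand-in for the measured one, downsampling uses `np.interp`, and the error function is taken to be the sum of squared differences; the helper names are not from the notebook.

```python
import numpy as np


def gaussian(x, x0, sigma, amplitude):
    # Background function (b): amplitude * exp(-(x - x0)^2 / (2 sigma^2))
    return amplitude * np.exp(-((x - x0) / sigma) ** 2 / 2)


def modified_function(y_base, base_amplitude, y_background):
    # (c) = scaled smooth base function (a) + background function (b)
    return base_amplitude * y_base + y_background


def downsample(x_fine, y_fine, x_samples):
    # (d) evaluate the fine-grid curve at the detector sampling positions
    return np.interp(x_samples, x_fine, y_fine)


def squared_error(y_model, y_measured):
    # Error function: sum of squared differences at the sampling positions
    return float(np.sum((np.asarray(y_model) - np.asarray(y_measured)) ** 2))


# Hypothetical stand-in for the interpolated base function on a 0.001-degree grid
x_fine = np.arange(-15.5, 15.5001, 0.001)
y_base = 40000 * np.exp(-x_fine ** 2 / (2 * 0.8 ** 2))
x_samples = np.arange(-15.5, 16.5, 1.0)  # 32 detector positions

y_mod = modified_function(y_base, 0.8, gaussian(x_fine, 0.0, 1.9, 3500))
y_down = downsample(x_fine, y_mod, x_samples)
```

`squared_error(y_down, y_rough)` would then score the modified function against a measured rough sample `y_rough` taken at the same 32 positions.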

2.1 Experimental rough data
The experimental data show that rough samples change both the amplitude and the tails of the base function. Each rough sample is plotted below against the base function:
Code
import pandas as pd
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource, Range1d
from bokeh.palettes import Set3
from bokeh.plotting import figure, show
# 1. Import data: an 'xaxis' column followed by one column per rough sample
rough_df = pd.read_excel('data/rough_samples.xlsx')
source_rough = ColumnDataSource(rough_df)
# 2. Create plots
rough_plots = []
color_palette = Set3[len(rough_df.columns[1:]) + 2]
# a. One figure per sample: interpolated base function (x_base, y_base from the previous step) vs rough data
for i, col in enumerate(rough_df.columns[1:]):
    rough_plot = figure(title=str(col), x_axis_label='xaxis', y_axis_label='yaxis', width=350, height=320,
                        tooltips=[("index", "$index"), ("(x,y)", "($x, $y)")])
    rough_plot.line(x_base, y_base, line_width=4, color='#9D6C97', legend_label='base function')
    rough_plot.line('xaxis', col, source=source_rough, color='#9DC3E6', legend_label=str(col), line_width=4)
    rough_plot.circle('xaxis', col, source=source_rough, fill_color=color_palette[i], size=7, legend_label=str(col))
    rough_plot.y_range = Range1d(-5000, 45000)
    rough_plot = plot_format(rough_plot, "Degrees", "Intensity", "top_left", "10pt", "10pt", "10pt")
    rough_plots.append(rough_plot)
grid_rough = gridplot(children=rough_plots, ncols=3, merge_tools=False)
show(grid_rough)

3 Modified function
Once the base function has been numerically defined, a background function can be added to obtain a modified function that approximates the rough data. The proposed background functions are:
Gaussian: \(A\exp\left(-\frac{(x-x_0)^2}{2\sigma^2}\right)\), with parameters \(x_0\), \(\sigma\) and \(A\). These parameters were tuned to \(x_0=0\), \(\sigma=1.9\) and \(A=3500\).
Lorentzian: \(A\frac{1}{1+\left(\frac{x-x_0}{\gamma}\right)^2}\), with parameters \(x_0\), \(\gamma\) and \(A\). These parameters were tuned to \(x_0=0\), \(\gamma=2.1\) and \(A=2500\).
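The two proposed background functions can be written directly in NumPy, with the tuned values above as defaults (a sketch; the function names are illustrative). Both peak at \(A\) when \(x=x_0\). One width away from the centre the Gaussian has dropped to \(Ae^{-1/2}\approx 0.61A\) and the Lorentzian to \(A/2\), but the Lorentzian decays only as \(1/x^2\), so it produces much heavier tails than the Gaussian.

```python
import numpy as np


def gaussian_background(x, x0=0.0, sigma=1.9, A=3500.0):
    """A * exp(-(x - x0)^2 / (2 sigma^2)), defaults set to the tuned values."""
    return A * np.exp(-((x - x0) ** 2) / (2 * sigma ** 2))


def lorentzian_background(x, x0=0.0, gamma=2.1, A=2500.0):
    """A / (1 + ((x - x0) / gamma)^2), defaults set to the tuned values."""
    return A / (1 + ((x - x0) / gamma) ** 2)


# Compare the two shapes at the centre, at one width, and far in the tail (degrees)
x = np.array([0.0, 1.9, 2.1, 10.0])
print(gaussian_background(x))
print(lorentzian_background(x))
```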
The base function (purple) is first multiplied by a factor of 0.8, and then each background function (green curve) is added to modify the amplitude and the tails. The resulting modified function (blue) is downsampled (brown dashed line with triangles) and compared with the experimental rough data of sample pt2d (yellow); the plot title reports the correlation coefficient between the two. Clicking a legend entry hides the corresponding curve.
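In the code below, the modified function is downsampled by selecting the points of the 0.001-degree grid whose x values coincide with the rough-data positions (`np.isin`), and the agreement is summarised by Pearson's correlation coefficient from `np.corrcoef`. Because `np.isin` compares floats exactly, rounding both axes to the grid resolution makes the match robust. The sketch below uses a hypothetical helper name and a synthetic curve, assumes the sampling positions are sorted and inside the grid, and checks the index-matching result against linear interpolation with `np.interp`.

```python
import numpy as np


def downsample_by_index(x_fine, y_fine, x_samples, decimals=3):
    """Select the fine-grid values at the sampling positions by exact x matching.

    Both axes are rounded to `decimals` places first, because np.isin uses exact
    float equality. x_samples is assumed sorted, unique and inside the grid.
    """
    x_fine_r = np.round(np.asarray(x_fine, dtype=float), decimals)
    x_samples_r = np.round(np.asarray(x_samples, dtype=float), decimals)
    indices = np.where(np.isin(x_fine_r, x_samples_r))[0]
    if len(indices) != len(x_samples_r):
        raise ValueError("some sampling positions are not on the fine grid")
    return np.asarray(y_fine)[indices]


# Synthetic curve on a 0.001 grid and 31 sampling positions offset by 0.3
x_fine = np.arange(-15.5, 15.5001, 0.001)
y_fine = np.exp(-x_fine ** 2 / 8)
x_samples = np.arange(-15.5, 15.5, 1.0) + 0.3
y_index = downsample_by_index(x_fine, y_fine, x_samples)
y_interp = np.interp(x_samples, x_fine, y_fine)
# Pearson correlation between the two downsampled versions (close to 1)
r = np.corrcoef(y_index, y_interp)[0, 1]
```

Raising an error when a position is missing avoids silently comparing arrays of different lengths; `np.interp` is an alternative that does not require the samples to lie exactly on the grid.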
Code
import numpy as np
import pandas as pd
from bokeh.layouts import gridplot
from bokeh.models import Range1d, Span
from bokeh.palettes import Set3
from bokeh.plotting import figure, show
color_palette = Set3[10]
# 1. Define background functions:
#    (name, shape f(x, x0, width), (x0, width, amplitude, base amplitude), parameter labels)
functions = [
    ("Gaussian", lambda x, x0, sigma: np.exp(-((x-x0)/sigma)**2/2), (0.0, 1.9, 3500, 0.8),
     (r'$x_0$ gaussian', r'$\sigma$ gaussian', 'amp_gaussian', 'base function amplitude 1')),
    ("Lorentzian", lambda x, x0, gamma: 1/(1 + ((x-x0)/gamma)**2), (0.0, 2.1, 2500, 0.8),
     (r'$x_0$ lorentzian', r'$\gamma$ lorentzian', 'amp_lorentzian', 'base function amplitude 2'))]
labels = ["1. ", "2. "]
equations = [
    r"$\exp\left(-\frac{(x-x_0)^2}{2\sigma^2}\right)$",
    r"$\frac{1}{1+\left(\frac{x-x_0}{\gamma}\right)^2}$"]
# 2. Get the interpolated base function
base_function = pd.read_csv('data/base_funtion_interpolated.csv')
# 3. Get rough data
rough_df = pd.read_excel('data/rough_samples.xlsx')
x_rough = rough_df["xaxis"].copy().values
# y_rough = ['ann1', 'pt2', 'pt2b', 'pt2c', 'pt2d', 'pt2e']
y_rough = ['pt2d']
columns = list(rough_df.columns)
figures = []
for j, (name, f, params_nums, params_names) in enumerate(functions):
    x0, width, amplitude, base_amplitude = params_nums
    p = figure(title=f"{labels[j]} {name}", width=750, height=450)
    # 4. Shift the base function axis by x0 (recomputed each time so shifts do not accumulate)
    x_base = (base_function['x_base'].values + x0).round(3)
    y_base = base_amplitude * base_function['y_base'].values
    x_background = x_base.copy()
    # 5. Background function and modified function
    y_background = amplitude * f(x_background, x0, width)
    y_final = y_base + y_background
    # 6. Plots
    # 6.1 Base function
    p.line(x_base, y_base, line_width=5, color='#9D6C97', legend_label='base_function')
    vline = Span(location=0.0, dimension='height', line_color='#FEEED9', line_width=1)
    p.add_layout(vline)
    # 6.2 Background function
    p.line(x_background, y_background, line_width=5, color='#9DD9C5', legend_label='background_function')
    # 6.3 Modified function, downsampled at the shifted rough-data positions
    #     (both sides rounded to 3 decimals because np.isin compares floats exactly)
    indices = np.where(np.isin(x_base, (x_rough + x0).round(3)))[0]
    y_final_points = y_final[indices]
    p.line(x_base, y_final, line_width=5, legend_label='Base + background functions', color='#A6DDFF', alpha=1.0)
    # 6.4 Plot format
    p.xaxis.ticker.desired_num_ticks = 10
    p.yaxis.ticker.desired_num_ticks = 10
    p.y_range = Range1d(-5000, 45000)
    p = plot_format(p, "Degrees", "Intensity", "top_left", "10pt", "10pt", "10pt")
    figures.append(p)
    # 6.5 Rough data plot; Pearson correlation between downsampled model and sample pt2d
    corr_coef = np.corrcoef(y_final_points, rough_df['pt2d'])[0, 1]
    p2 = figure(title=f"{name} downsampling; correlation coefficient: {corr_coef:.4f}", width=750, height=450)
    p2.line(x_base, y_final, line_width=5, legend_label='Base + background functions', color='#A6DDFF', alpha=1.0)
    k = 0
    for col in columns[1:]:
        if col in y_rough:
            p2.line(x_rough, rough_df[col], legend_label=col, line_width=5, color=color_palette[k+1])
            p2.circle(x_rough, rough_df[col], legend_label=col, size=7, color='#5F9545')
            k += 1
    # 6.6 Downsampled data
    p2.line(x_rough + x0, y_final_points, line_width=5, legend_label='Downsampling', color='#98473E', alpha=0.7, line_dash='dashed')
    p2.triangle(x_rough + x0, y_final_points, size=10, legend_label='Downsampling', color='#DB8A74')
    # 6.7 Plot format
    p2 = plot_format(p2, "Degrees", "Intensity", "top_left", "10pt", "10pt", "10pt")
    p2.xaxis.ticker.desired_num_ticks = 10
    p2.yaxis.ticker.desired_num_ticks = 10
    p2.y_range = Range1d(-5000, 45000)
    figures.append(p2)
grid_modified = gridplot(children=figures, ncols=2, merge_tools=False, width=500, height=450)
show(grid_modified)

4 Minimization function
The background parameters can also be fitted automatically. For each rough sample, the code below starts from the initial guesses in the 'Guess' sheet and uses scipy.optimize.minimize to minimize the sum of squared errors between the downsampled modified function (scaled base function plus Gaussian background) and the measured data; the optimized parameters [x0, A0, sigma, A1] are printed for each sample.
Code
import numpy as np
import pandas as pd
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource, Range1d
from bokeh.palettes import Set3
from bokeh.plotting import figure, show
from scipy.optimize import minimize
# 1. Import base function, rough data and per-sample initial guesses
base_df = pd.read_csv('data/base_funtion_interpolated.csv')
x_base = base_df['x_base'].values.round(3)
y_base = base_df['y_base'].values
rough_df = pd.read_excel('data/rough_samples.xlsx', sheet_name='Data')
x_rough = rough_df['xaxis'].values
source_rough = ColumnDataSource(rough_df)
guess_df = pd.read_excel('data/rough_samples.xlsx', sheet_name='Guess')
guess_df = guess_df.set_index('Variables')
# 2. Scaled base function (renamed so it no longer shadows the loaded data)
def scale_base(y_base, A0):
    return A0 * y_base
# 3. Gaussian background function
gaussian = lambda x, x0, sigma, A1: A1 * np.exp(-((x - x0) / sigma) ** 2 / 2)
# 4. Modified function: A0 * base + Gaussian centred at x0
def modified_function(params, x=x_base):
    x0, A0, sigma, A1 = params
    return scale_base(y_base, A0) + gaussian(x, x0, sigma, A1)
# 5. Downsample at the rough-data positions shifted by x_shift
#    (rounded to 3 decimals because np.isin compares floats exactly)
def downsample(y_modified, x_shift):
    indices = np.where(np.isin(x_base, (x_rough + x_shift).round(3)))[0]
    return y_modified[indices]
# 6. Cost function: sum of squared errors at the sampling points. The sampling
#    shift x_shift is held at the initial x0 guess; the fitted x0 moves only the Gaussian.
def cost_function(params, x, y, x_shift):
    y_modified = modified_function(params, x=x)
    return np.sum((downsample(y_modified, x_shift) - y) ** 2)
# 7. Bounds: x0 free; A0, sigma and A1 non-negative
bounds = ((None, None), (0, None), (0, None), (0, None))
# 8. Fit and plot each rough sample
rough_plots = []
color_palette = Set3[len(rough_df.columns[1:]) + 2]
for i, col in enumerate(rough_df.columns[1:]):
    x0 = guess_df.loc['x0'][col]
    Abase = guess_df.loc['Abase'][col]
    sigma = guess_df.loc['sigma'][col]
    Agaussian = guess_df.loc['Agaussian'][col]
    params = [x0, Abase, sigma, Agaussian]
    # 9. Modified function with this sample's initial guess
    y_modified = modified_function(params, x=x_base)
    # 10. Call minimize (L-BFGS-B is selected automatically when only bounds are given)
    y_rough = rough_df[col].copy().values
    cost_fn = lambda p: cost_function(p, x_base, y_rough, x0)
    result = minimize(cost_fn, params, bounds=bounds)
    optimized_parameters = result.x
    print(optimized_parameters)
    # 11. Optimized modified function and its downsampled points
    y_optimized = modified_function(optimized_parameters, x=x_base)
    y_optimized_points = downsample(y_optimized, x0)
    # 12. Initial guess of the modified function
    rough_plot = figure(title=str(col), x_axis_label='xaxis', y_axis_label='yaxis', width=550, height=400,
                        tooltips=[("index", "$index"), ("(x,y)", "($x, $y)")])
    rough_plot.line(x_base, y_modified, line_width=4, color='#9D6C97', legend_label='initial guess')
    # 13. Optimized modified function
    rough_plot.line(x_base, y_optimized, legend_label='Optimized function', line_width=5, color='#9DD9C5')
    # 14. Rough data plot
    rough_plot.line('xaxis', col, source=source_rough, color='#9DC3E6', legend_label=str(col), line_width=4)
    rough_plot.circle('xaxis', col, source=source_rough, fill_color=color_palette[i], size=7, legend_label=str(col))
    # 15. Optimized modified downsampled points
    rough_plot.line(x_rough + x0, y_optimized_points, line_width=5, legend_label='Downsampling', color='#98473E', alpha=0.7, line_dash='dashed')
    rough_plot.triangle(x_rough + x0, y_optimized_points, size=9, legend_label='Downsampling', color='#DB8A74')
    # Plot format
    rough_plot.y_range = Range1d(-5000, 50000)
    rough_plot.xaxis.ticker.desired_num_ticks = 10
    rough_plot.yaxis.ticker.desired_num_ticks = 10
    rough_plot = plot_format(rough_plot, "Degrees", "Intensity", "top_left", "10pt", "10pt", "10pt")
    rough_plots.append(rough_plot)
grid_rough = gridplot(children=rough_plots, ncols=3, merge_tools=False)
show(grid_rough)
[-7.58741127e-02 9.05659441e-01 1.30316633e+00 3.99999994e+03]
[0.1481848 0.99458571 1.38493068 0. ]
[ -1.20930938 0.96724052 1.13247649 863.71353654]
[-4.39494896e-01 9.03629079e-01 1.43742108e+00 2.24999999e+03]
[-2.02137631e-01 8.23752242e-01 1.88172609e+00 2.24999999e+03]
[-5.04324259e-01 4.50000731e-01 1.59452219e+00 9.24999999e+03]
Code
guesses_df = pd.read_excel('data/rough_samples.xlsx', sheet_name='Guess')
guesses_df = guesses_df.set_index('Variables')
guesses_df.loc['sigma']
ann1 1.5
pt2 1.4
pt2b 1.3
pt2c 1.2
pt2d 1.1
pt2e 1.0
Name: sigma, dtype: float64
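One way to check that this minimization recovers known parameters is to fit synthetic data generated with the same model. The sketch below is illustrative only: the base function is a hypothetical narrow Gaussian stand-in, intensities are normalised to about 1 so that the four parameters have similar scales for L-BFGS-B (the method `scipy.optimize.minimize` selects when only bounds are given), downsampling uses `np.interp` instead of index matching, and \(\sigma\) is given a small positive lower bound to avoid division by zero.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical, normalised stand-in for the interpolated base function
x_fine = np.arange(-15.5, 15.5001, 0.001)
y_base = np.exp(-x_fine ** 2 / (2 * 0.6 ** 2))
x_rough = np.arange(-15.5, 16.5, 1.0)  # 32 sampling positions


def model(params):
    """A0 * base + Gaussian background, downsampled at x_rough."""
    x0, A0, sigma, A1 = params
    y = A0 * y_base + A1 * np.exp(-((x_fine - x0) / sigma) ** 2 / 2)
    return np.interp(x_rough, x_fine, y)


def cost(params, y_measured):
    # Sum of squared errors at the sampling positions
    return np.sum((model(params) - y_measured) ** 2)


true_params = np.array([0.2, 0.9, 1.5, 0.4])  # x0, A0, sigma, A1
y_synthetic = model(true_params)
bounds = ((None, None), (0, None), (1e-3, None), (0, None))
result = minimize(cost, x0=[0.0, 1.0, 1.3, 0.3], args=(y_synthetic,), bounds=bounds)
print(result.x)
```

Because the amplitudes in the notebook are in the thousands while \(x_0\) and \(\sigma\) are of order 1, rescaling the intensities in a similar way may help the optimizer move all parameters, not only the well-scaled ones.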
5 Conclusions
- An experimental base function was obtained from a smooth wafer at different angles.
- Two background functions, a Gaussian and a Lorentzian, were proposed to obtain a modified function that matches the experimental rough data.
- Their parameters (\(x_0\), width and amplitude) were tuned to match the experimental data.
- Adding a background function to the base function was observed to reproduce the effect of ‘adding’ roughness to a smooth wafer: both the amplitude and the tails change.
- The next step is to define a minimization function that reduces the error between the experimental data and the modified function.
6 Simulation WebApp
A web application including all the previous functions can be accessed here
